Automatic Learning of Parallel Dependency Treelet Pairs

Authors

  • Yuan Ding
  • Martha Palmer
Abstract

Induction of synchronous grammars from empirical data has long been an unsolved problem, even though generative synchronous grammars are theoretically well suited to the machine translation task. This is mainly due to pervasive structural divergences between languages. This paper presents a statistical approach to learning dependency structure mappings from parallel corpora. The algorithm introduced in this paper extends the dependency tree word alignment algorithm in (Ding, 2003). The new algorithm automatically learns parallel dependency treelet pairs from loosely matched non-isomorphic dependency trees while keeping computational complexity polynomial in the length of the sentences. A set of heuristics is introduced and specifically optimized for parallel treelet learning using Minimum Error Rate training. As learning parallel syntactic structures is the key step in the automatic learning of a synchronous grammar, the parallel dependency treelet pairs learnt by our approach serve as an important first step in any lexicalized synchronous grammar induction.


Similar resources

Dependency Treelet Translation: Syntactically Informed Phrasal SMT

We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. This method requires a source-language dependency parser, target language word segmentation and an unsupervised word alignment component. We align a parallel corpus, project the source dependency parse onto the target sentence, e...


Dependency Tree Translation: Syntactically Informed Phrasal SMT

We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. We depend on a source-language dependency parser and a word-aligned parallel corpus. The only target language resource assumed is a word breaker. These are used to produce treelet (“phrase”) translation pairs as well as several m...


Description of KYOTO EBMT System in PatentMT at NTCIR-10

This paper describes the "KYOTO" EBMT system that participated in PatentMT at NTCIR-10. When translating between very different language pairs such as Japanese-English, it is very important to handle sentences as tree structures to overcome the difference. Many recent studies incorporate tree structures in some parts of the translation process, but not all the way from model training (parallel sentence alignment) to...


A Dependency Treelet String Correspondence Model for Statistical Machine Translation

This paper describes a novel model using dependency structures on the source side for syntax-based statistical machine translation: the Dependency Treelet String Correspondence Model (DTSC). The DTSC model maps source dependency structures to target strings. In this model, translation pairs of source treelets and target strings with their word alignments are learned automatically from the parsed and...


Learning Method for Automatic Acquisition of Translation Knowledge

This paper presents a new learning method for automatic acquisition of translation knowledge from parallel corpora. We apply this learning method to automatic extraction of bilingual word pairs from parallel corpora. In general, similarity measures are used to extract bilingual word pairs from parallel corpora. However, similarity measures are insufficient because of the sparse data problem. Th...




Publication date: 2004